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I. Introduction 


The growth in computational power and algorithm development in the past few decades has granted 
the science and engineering community the ability to simulate flows over complex geometries, thus making 
Computational Fluid Dynamics (CFD) tools indispensable in analysis and design. Currently, one of the 
pacing items limiting the utility of CFD for general problems is the prediction of unsteady turbulent flows.+ 3 
Reynolds-averaged Navier-Stokes (RANS) methods, which predict a time-invariant mean flowfield, struggle to 
provide consistent predictions when encountering even mild separation, such as the side-of-body separation at 
a wing-body junction. NASA’s Transformative Tools and Technologies project is developing both numerical 
methods and physical modeling approaches to improve the prediction of separated flows. A major focus 
of this effort is efficient methods for resolving the unsteady fluctuations occurring in these flows to provide 
valuable engineering data of the time-accurate flow field for buffet analysis, vortex shedding, etc. This 
approach encompasses unsteady RANS (URANS), large-eddy simulations (LES), and hybrid LES-RANS 
approaches such as Detached Eddy Simulations (DES). These unsteady approaches are inherently more 
expensive than traditional engineering RANS approaches, hence every effort to mitigate this cost must be 
leveraged. Arguably, the most cost-effective approach to improve the efficiency of unsteady methods is the 
optimal placement of the spatial and temporal degrees of freedom (DOF) using solution-adaptive methods. 

Under the steady-flow assumption, solution-adaptive methods have been demonstrated to be efficient 
in reducing discretization error by spatially distributing degrees of freedom according to a scalar field — 
the adaptive indicator — that assesses the importance of different regions of the domain. A robust method 
for identifying all important regions in the domain is the adjoint-weighted residual which estimates the 
discretization error in an output of interest (e.g.: lift and drag) and the elemental contribution to this error 
yields an adaptive indicator.+ Degrees of freedom are added/removed by locally refining/coarsening (h- 
adaptation) the mesh or by increasing/decreasing (p-adaptation) the scheme’s approximation order. The 
combination and the choice between these options are not trivial tasks and they have been the subjects of 
many research works.° '! Other approaches that remesh!? 4 the domain based on a metric field derived 
from adjoint-based error estimates and anisotropy measures have also been demonstrated to be efficient in 
controlling discretization error. 

A far less explored research topic is solution adaptation for accurate and efficient simulation of unsteady 
flows.14-!7 The limited number of research works in space-time adaptivity is partly due to algorithmic 
challenges to affordably simulate large-scale unsteady flows and to the difficulty of designing and implement- 
ing a robust space-time mesh adaptation strategy. Nevertheless, these challenges are worth taking as the 
two-dimensional results in Ref. 17 show that adjoint-based space-time adaptation can reduce the number of 
degrees of freedom by one to three orders of magnitude in comparison to uniform refinement, especially when 
a high level of accuracy is required. More importantly, the authors show that certain heuristic indicators 
cause the adaptive procedure to converge to the wrong output value. 

In steady problems, the cost of an adjoint solution is comparable to the cost of the primal solution with 
small variations depending on the implementation. The time dependence in unsteady problems requires the 
adjoint to be solved backwards in time and, at each time step, a residual linearization about the current 
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primal state makes the cost balance between primal and dual solutions depend highly on the implementation. 
One of the main contributors to the computational cost is forming and storing the residual Jacobian. This 
cost is twofold, as storing the Jacobian not only requires a large amount of memory but also hampers the 
computational efficiency due to memory bandwidth usage. 

In this work, we avoid these issues by implementing the transpose-Jacobian action on a vector to use 
the existing matrix-free Krylov infrastructure!® as the adjoint solver. This work extends our current solver 
capability that has been demonstrated to achieve high orders of accuracy (larger than 8**-order) on turbulent 
flows.!9:?0 We will use this work as a building block for our goal of output-based space-time hp-adaptation 
which will allow us to perform LES at significantly lower computational cost than currently. In this paper, 
we will concentrate on the adjoint solver and the adaptation mechanics will be covered in future papers. 
This abstract is organized as follows. In Section II, we describe the discretization followed by the primal 
solution strategy presented in Section III. Section IV presents the derivation of the adjoint equation and 
Section V presents preliminary results. We present the plan for the final manuscript in Section VI. 


II. Discretization 


For completeness, we briefly present the discretization of the flow equations. More details are found in 
Ref. 18. The compressible Navier-Stokes equations are written in conservative form as: 


qit+V-(f'-f")=g, (1) 


where (-) 4 denotes partial differentiation with respect to time. The conservative state vector is 


q=| pul, (2) 


where p is the density, w is the velocity vector, and E the total energy. The inviscid and viscous fluxes are 
given, respectively, by: 


pu 0 
fi = puut + pl ; and ee = T , (3) 
pull T-u-—KrpVT 


where p is the static pressure, H = E+ . is the total enthalpy, 7 the viscous stress tensor, K7 is the thermal 
conductivity, T = p/pR is the temperature, and R is the gas constant. The pressure is given by: 


p=(y-1) (pE — pu’) ; (4) 
where ¥ is the specific heat ratio. The viscous stress tensor, T, is given by: 
T= p(Vut Vu") —-XA(V-u)r, (5) 


where p is the viscosity, and \ = z i is the bulk viscosity. 
Applying a change of variables g = q(v), where v are the entropy variables: 





225 sated oy ct PE 
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v= Dp ’ (6) 
2 
P 
we rewrite the Navier-Stokes equations as: 
Aov,+AVv—V-(KVv) = g, (7) 


with symmetric Aj =q., A= f',Ao = fe and K = f\GqAo = foe 
We proceed to discretize (7) as follows. The domain, Q, is partitioned into non-overlapping elements, «, 
while the time is partitioned into intervals (time-slabs), J” = [¢",t”+"]. Define V, = {w, w|,. € [P(k x I)]°}, 
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the space-time finite-element space consisting of piece-wise polynomial functions in both space and time on 
each element. We seek a solution v € Y;, such that the weak form: 


rw,v) = SU. [erat vw TF) +09) 
+f fw Gin Fon 
+f rocer) g(t) — w(eh) att) } =o (8) 


is satisfied for all w € V,. Here f/ -n and f" -n denote numerical flux functions approximating the inviscid 
and viscous fluxes, respectively, while n is the outward pointing normal vector. In this work, the inviscid 
flux is discretized using the method of Ismail and Roe,?? while the viscous flux is discretized using the second 
method of Bassi and Rebay.?3 

The space Y;, is represented using a tensor-product basis such that on each element v is given by the 
product of Lagrange polynomials: 


v(x(€), ¢(7)) = Vigar Pagar with Di 5n1 = Gi(E1)b; (E2) Oe (Es) Or(7), (9) 


where x(€) defines a mapping from the reference cube, € C [—1,1]°, to physical space, while t(r) is the 
mapping from the reference interval [—1, 1] to the time interval [¢”,t”*1]. @; are one-dimensional Lagrange 
basis functions defined at Gauss-Legendre (GL) points, while Vj;,; are the corresponding nodal values of 
the entropy variables. The integrals in (8) are evaluated using numerical quadrature. For example: 


az |, | - (oa+ w(t! #")) 
~ {-(rew,.-q+Vew:(F -F°))) lat} Wp War We, (10) 


§pfakrTs 
where €),&q,€r,7s are one-dimensional GL quadrature points, and wp, wg, wr and ws are the associated 
quadrature weights. |J| denotes the Jacobian of the mapping from element reference space to physical space, 


Ve denotes differentiation with respect to the reference coordinate €, while 7 = €% f’ and 7" =€, 7," 
are the fluxes mapped to the local element coordinate system. In this work we use a quadrature rule with 
twice as many quadrature points as nodal points in order to reduce the quadrature error (we ensure exact 
integration of cubic nonlinearities) thereby improving the nonlinear stability of our scheme.2* ?° 

The remaining integrals appearing in (8) are evaluated in a similar manner, which may be described as 
a sequence of three steps: 


1. Evaluate the state (v) and gradient (Vev) at the quadrature points. 


~1 ~V 
2. Evaluate the source (g) and fluxes (f and f ) at the quadrature points. 
3. Multiply the source and fluxes with the basis functions (w) or gradients (Vew). 


A key requirement for efficiency at high polynomial order is the evaluation of the first and third steps using 
the sum-factorization approach,” 2° which allows the multiplication of the basis functions to be performed as 
a sequence of one-dimensional operations. This results in a residual evaluation cost which scales as O(N@*1) 
for each space-time element where JN is the solution order and d is the number of spatial-temporal dimensions 
(for unsteady 3D simulations d = 4). For a fixed number of spatial-temporal degrees of freedom (i.e. fixed 
DOF = N*) the residual evaluation scales linearly with the solution order. However for moderate solution 
orders, N = 4 — 16, the increased operation count with solution order may be offset by the use of optimized 
numerical kernels?° via SIMD vectorization. As vector length increases in future processor architectures, the 
aforementioned solution order range can further increase. 


III. Primal Solution Strategy 


Given the choice of basis functions and quadrature rule, Equation (8) represents a globally coupled system 
of nonlinear equations which need to be solved for each time-slab. For large three-dimensional simulations 
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the cost of storing the linearization for a single step of an implicit scheme may be prohibitively expensive. 
Using a space-time formulation, this storage cost is further scaled by the order of the basis used in the 
temporal direction. In general, all degrees of freedom within an element are coupled, leading to a storage 
cost which scales as O(N“) for a fixed number of degrees of freedom. We overcome the memory requirement 
limitation by using a matrix-free Newton-Krylov method. 

A restarted GMRES method is used as Krylov solver, which does not require the explicit storage of the 
Jacobian matrix.2” GMRES requires only the application of the linearization to each search direction, i.e. 
we need to compute the linearized residual in the search direction. In this work, the linearized residual is 
computed directly. As with the residual evaluation, the terms in the evaluation of the linearized residual in 
a search direction, s, are computed as a sequence of three steps: 


1. Evaluate the state (v), gradient (Vev), linearized state (s) and linearized gradient (Ves) at the quadra- 
ture points. 


. ~Li 3 3 
2. Evaluate the linearized source gli" = 225 + ae Ves and linearized fluxes (f “" = 9£5 + ae Ves) 


at the quadrature points. 
3. Multiply of the linearized source and fluxes with the basis functions (w) or gradients (Vew). 


This approach is more expensive than the finite-difference approach often used in Jacobian-free method,?® 
however it is insensitive to a step-size parameter, provides the exact linearization, and it is used to compute 
the solution of adjoint (dual) problems. Note that steps 1 and 3 are transpose of each other, hence, only the 
flux Jacobians (step 2) need to be transposed for the adjoint residual operator. 

The sum factorization approach is used in the application of steps 1 and 3, such that the cost of applying 
the linearization scales as O(NV) for a fixed number of space-time degrees of freedom. We note that with 
increasing solution order this is more efficient than storing the linearization and computing the linearized 
residual as a matrix-vector product whose cost scales as O(N“) (even if there was no cost associated with 
forming the linearization). 

Furthermore, this approach allows us to use different quadrature rules for computing the linearized and 
nonlinear residuals. In particular, we often use a lower order, or collocated, quadrature rule in the evaluation 
of the linearized residuals of our Newton-Krylov scheme. The nonlinear residual, however, is evaluated with a 
de-aliased quadrature. The use of a collocated quadrature for the linearized residual introduces a linearization 
error hence our Newton method may be viewed a inexact. 


IV. Discrete Adjoint Equation 


Consider an output 7(a@,V), where the state V satisfies the discrete residual equations R(a,V) = 0 
for a parameter set a. We seek variations of the output that remain in the space of realizable solutions to 
the discrete residual equations. Assuming the output functional is at least C!-continuous and the residual 
equations admit a single solution for a given a, output variations must be balanced by corresponding residual 
variations. We can relate these variations by forming the output Lagrangian, 


L(a, V,¥) = J(a, V) + © R(a, V) (11) 


and force its stationarity for arbitrary OV and 6a: 











Os” aR ag” OR 
a oO Vv Vv 
—_—_—_—_—_—_—_—_—_—_—_—_—_—-" —_——$S$-_s————_—_—_____—S 
adjoint equation sensitivity equation 


We solve the adjoint equation for W and use its value to compute the output sensitivity with respect to a 
via the sensitivity equation. A generalization of the sensitivity equation can be derived in variation form to 
yield the adjoint-weighted error estimate.4 An integral part of solving the adjoint equation is the transposed 
action of the residual Jacobian onto a search direction. We perform this operation in a matrix-free manner 
to prevent the computational cost from scaling super-linearly with solution order. 

As with the primal solve, we use a GMRES method which does not require the explicit storage of the 
Jacobian matrix; only the application of the adjoint operator to each search direction. As with the primal 
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solve, the adjoint residual is computed exactly. The terms in the evaluation of the adjoint residual in a 
search direction, s, are again computed in a sequence of three steps: 


1. Evaluate the state (v), gradient (Vev), linearized state (s) and linearized gradient (Ves) at the quadra- 
ture points. 





; T oT ~ Adj T me 
2. Evaluate the adjoint source (g44 = oa s+ of s) and fluxes (f-? = ae Vest ae Ves) at the 


quadrature points. 
3. Multiply the adjoint sources and fluxes with the basis functions (w) or gradients (Vew). 


We note that steps 1 and 3 are identical to those for the primal problem, while step 2 involves simply the 
transpose of the operations computed in the linearized residual. 

The Fréchet derivatives of the flux at the quadrature points from Section III are not explicitly con- 
structed in our implementation. Instead, we express them as a sequence of algebraic operations and the 
implementation of their transpose follows an approach similar to the reverse-mode automatic differentiation. 

Suppose we compute g(v) using the sequence of steps 


gn = giv) (13) 
g2 = g2(v,91) (14) 
g = 9(%,92) (15) 
Then the linearized residual in a search direction s is computed using the chain rule 
: Og 
lin L 
= = 16 
J Ov § ( ) 
; Og Og . 
lin 2 2 lin 
= 4 17 
92 av § Ogi I ( ) 
O9 i O9 
= on + wm 18 
g ag Nn aga 92 (18) 
The adjoint residual in a search direction a is simply the reverse operation: 
; O | O 
adj g adj g 
a = ——a a = 7 19 
1 Ogi 2 Ogo ( ) 
O92 Og2 
i OD fe 20 
a av a ayo + Bai (20) 
adj Ogi 
aceo += = (21) 
Ov 


Figure 1 shows that the transpose Jacobian application is slightly more expensive than the forward 
application. We intend to investigate the reason for this extra cost for the final version of this paper. 
Through optimized kernels, we can further improve the linear scaling provided by the sum factorization 
approach and achieve approximately constant cost for the linearized residual evaluation up to 16*-order.1® 

The parallelization of the adjoint solver uses the same infrastructure as the primal solver. We will describe 
the parallelization details in the final version of the paper. 


V. Preliminary Results 


We present a preliminary result for the NACA0012 airfoil at M,. = 0.5, a = 1° and Re = 5000 with a 
4th_order spatial discretization. Figure 2 shows the primal and «-force-adjoint fields for z-momentum. 

We have preliminarily verified the x-force sensitivity with respect to angle of attack with a 2°¢-order 
solution by comparing the adjoint-based sensitivity with a central-difference formula. The results for Aa = 
0.0lrad are: 


Central difference: — 0.06454802; Adjoint: — 0.006722043; difference: 0.7%. 


We suspect the difference between the two sensitivities is due to the stepsizes used in both the central- 
difference used for the forward sensitivity and in the computation of the righthand side of the adjoint 
equation. 
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Figure 1: Computational cost per degree of freedom of the forward and transpose Fréchet derivatives. 


VI. Plan for Manuscript 


For the final version of this manuscript we will present two verification cases for the adjoint implemen- 
tation. The first case will be steady laminar flow over the NACA0012 airfoil at M,, = 0.5, a = 1° and 
Re = 5000. We will perform the calculations at varying solution approximation orders and we will use the 
sensitivity equation to verify the exact discrete sensitivity of the drag and lift outputs with respect to angle 
of attack. The second case will be unsteady three-dimensional turbulent flow over the MDA30p30n airfoil at 
high angle of attack. We will verify the exact discrete sensitivity of the integrated lift over time with respect 
to a variation in angle of attack. 
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(a) N = 4 primal solution. (b) N = 4 «-force-adjoint z-momentum solution. 


Figure 2: 4**-order primal and x-force adjoint solution. 
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